@JacquelineGrecco JacquelineGrecco commented Oct 15, 2025

Pull Request

Description

Terraform module for deploying a Databricks workspace on AWS with Backend PrivateLink and customer-managed keys (CMK). The module includes VPC configuration, KMS encryption, and Unity Catalog setup with self-assuming IAM roles for external locations.

Confluence Card

Category

  • core-platform
  • data-engineering
  • data-governance
  • data-warehousing
  • genai-ml
  • launch-accelerator
  • workspace-setup

Type of Change

  • New project
  • Bug fix
  • Enhancement
  • Documentation

Project Details

Project Name: AWS Backend PrivateLink with CMK deployment
Purpose: Provide a production-ready Terraform template for deploying secure Databricks workspaces on AWS with:

  • Private network connectivity (no public internet exposure for control plane traffic)
  • Customer-managed encryption keys for enhanced security and compliance
  • Unity Catalog for data governance
  • Support for both greenfield (new resources) and brownfield (existing infrastructure) deployments

Technologies Used:
  • Terraform (>= 1.5.0)
  • AWS Provider (~> 6.0)
  • Databricks Provider (>= 1.30.0)
  • AWS Services: VPC, PrivateLink, KMS, IAM, S3
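The version constraints listed above could be pinned in a `required_providers` block; a minimal sketch (provider sources assumed to be `hashicorp/aws` and `databricks/databricks`):

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = ">= 1.30.0"
    }
  }
}
```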

Testing

  • Code runs without errors
  • Documentation is complete
  • Used only synthetic data

Security Compliance ✅

  • No customer data, PII, or proprietary information
  • No credentials or access tokens
  • Only synthetic data used
  • Third-party licenses acknowledged

By submitting this PR, I confirm I have followed the CONTRIBUTING.md guidelines and security requirements.

@haleyyyblue
Collaborator

Hey @JacquelineGrecco ! 👋
Great work on this PR! The PrivateLink + CMK implementation is solid. I have a few suggestions to better align with our repository philosophy:
🎯 Key Feedback
1. Folder Naming Convention
Our README suggests a naming convention for scenario folders.
Current: aws-pl-back-cmk
Consideration: the network type (BYOVPC) isn't clear in the name.
Option: aws-byovpc-backend-pl-cmk (more explicit)

2. Scenario-Based Architecture
Our repo follows a one folder = one scenario approach. The folder name aws-pl-back-cmk declares this scenario always includes CMK.
Issue: create_new_cmk = false lets users skip CMK entirely, which conflicts with the scenario name.
Suggestion:
  • Remove the create_new_cmk boolean variable.
  • Keep just cmk_arn (optional) for using an existing CMK.
Customers who don't want CMK should use a different scenario folder.
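The cmk_arn-only approach could look like this sketch (variable, resource, and local names are illustrative, not the PR's actual code):

```hcl
variable "cmk_arn" {
  description = "ARN of an existing customer-managed KMS key. Leave null to create a new CMK."
  type        = string
  default     = null
}

# A new key is created only when no existing ARN is supplied;
# the CMK itself is always part of this scenario.
resource "aws_kms_key" "workspace" {
  count       = var.cmk_arn == null ? 1 : 0
  description = "CMK for Databricks workspace storage and managed services"
}

locals {
  effective_cmk_arn = var.cmk_arn != null ? var.cmk_arn : aws_kms_key.workspace[0].arn
}
```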

3. Remove Unity Catalog
Unity Catalog belongs in data-governance/ as a separate scenario. Removing it will:
  • Cut ~35% of the code (~350 lines)
  • Make this scenario focused on "secure workspace setup"
  • Simplify things for customers who just want workspace + PrivateLink + CMK

4. Reduce Modules
README says: "Minimal modularization" and "Avoid complex module abstractions"
Keep: the aws-network/ module (the VPC conditional logic is genuinely complex)
Flatten into top-level files:
  • aws-cmk/ → cmk.tf
  • aws-iam/ → iam.tf
  • aws-storage/ → storage.tf
  • aws-unity-catalog/ → (remove)
Why: These resources don't have complex conditionals. Customers should easily find and customize IAM policies, KMS settings, and S3 configs without navigating modules.
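Under the flattened layout, the scenario folder might look like this (file names follow the mapping above; this is a sketch, not the PR's actual tree):

```
aws-pl-back-cmk/
├── main.tf
├── variables.tf
├── cmk.tf        # formerly modules/aws-cmk
├── iam.tf        # formerly modules/aws-iam
├── storage.tf    # formerly modules/aws-storage
└── modules/
    └── aws-network/   # kept: complex VPC conditional logic
```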

📝 Minor Items

  • Remove the GITHUB_ACTIONS.md reference in the README (the file doesn't exist)
  • Add a stronger warning to the IAM policy (it's very permissive, with broad s3:* and ec2:* actions)
  • Add brief comments to the time_sleep resources explaining why they're needed
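For the time_sleep item, a sketch of the kind of comment that would help (the resource names and duration here are illustrative, not the PR's actual code; `time_sleep` is from the hashicorp/time provider):

```hcl
# Wait for IAM eventual consistency: the Databricks workspace API can reject
# the cross-account role if it is used immediately after creation, so pause
# briefly before the workspace resource consumes it.
resource "time_sleep" "wait_for_iam_propagation" {
  depends_on      = [aws_iam_role.cross_account]
  create_duration = "20s"
}
```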

✅ What's Great
  • Clean VPC conditional logic
  • Good validation checks
  • Both tfvars examples
  • Proper dependencies and waits
💬 Summary
4 main changes:
1. Align the folder name with the naming convention
2. Remove the create_new_cmk variable (CMK is always part of this scenario)
3. Remove Unity Catalog (separate scenario)
4. Flatten the other modules; keep just the network module
This will make the code ~35% shorter and much easier for customers to customize!
Happy to discuss or pair on this if helpful. Great work! 🚀
